The dataset provides comprehensive information on malnutrition levels across countries and UNICEF regions. The data is sourced from the UNICEF website: https://data.unicef.org/resources/dataset/malnutrition-data/
# Import libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# Load the dataset
excel_url = 'path/malnutrition.xlsx'
malnutrition_df = pd.read_excel(excel_url)
malnutrition_df.head()
|  | ISO code | Country and areas | Survey year | Year* | United Nations Region | United Nations Sub-Region | SDG Region | UNICEF Region | UNICEF Sub-Region | WHO Region | ... | Stunting | Stunting Footnote | WAZ Survey Sample (N) | Underweight | Underweight Footnote | Fieldwork period | Report Author | Source | Short Source | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | AFGHANISTAN | 1997 | 1997 | Asia | Southern Asia | Central Asia and Southern Asia | SA | SA | EMRO | ... | 53.2 | cw 20 | 4846.0 | 44.9 | c | January,1997-May,1997 | CIET International | Afghanistan 1997 multiple indicator baseline (... | MICS | 3706.024902 |
| 1 | AFG | AFGHANISTAN | 2004 | 2004 | Asia | Southern Asia | Central Asia and Southern Asia | SA | SA | EMRO | ... | 59.3 | w 21 | NaN | 31.7 | b | May,2004-June,2004 | Ministry of Public Health (Afghanistan), UNICE... | Summary report of the national nutrition surve... | NNS | 4705.370117 |
| 2 | AFG | AFGHANISTAN | 2013 | 2013 | Asia | Southern Asia | Central Asia and Southern Asia | SA | SA | EMRO | ... | 40.4 | r | 4426469.0 | 24.6 | s | May,2013-October,2013 | Ministry of Public Health, UNICEF and the Aga ... | Afghanistan National Nutrition Survey 2013 | NNS | 5433.032227 |
| 3 | AFG | AFGHANISTAN | 2018 | 2018 | Asia | Southern Asia | Central Asia and Southern Asia | SA | SA | EMRO | ... | 38.2 | r | 19539.2 | 19.1 | s | March,2018-November,2018 | KIT Royal Tropical Institute | Afghanistan Health Survey 2018 | Other | 6147.354980 |
| 4 | ALB | ALBANIA | 2005 | 2005 | Europe | Southern Europe | Northern America and Europe | ECA | EECA | EURO | ... | 26.7 | NaN | 1090.2 | 6.6 | NaN | October,2005-November,2005 | Albanian National Institute of Statistics. | Albania multiple indicator cluster survey 2005... | MICS | 214.872009 |
5 rows × 36 columns
# Show basic info (dtypes and non-null counts) for all columns
malnutrition_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1100 entries, 0 to 1099
Data columns (total 36 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ISO code 1100 non-null object
1 Country and areas 1100 non-null object
2 Survey year 1100 non-null object
3 Year* 1100 non-null int64
4 United Nations Region 1100 non-null object
5 United Nations Sub-Region 1100 non-null object
6 SDG Region 1100 non-null object
7 UNICEF Region 1088 non-null object
8 UNICEF Sub-Region 1088 non-null object
9 WHO Region 1091 non-null object
10 World Bank Income Classification 1079 non-null object
11 World Bank Region 1100 non-null object
12 LDC 404 non-null object
13 LIFD 436 non-null object
14 LLDC or SIDS 389 non-null object
15 UNICEF Survey ID 1100 non-null int64
16 WHO Global Database Number 1100 non-null int64
17 Type of Estimate 1100 non-null object
18 WHZ Survey Sample (N) 982 non-null float64
19 Severe Wasting 897 non-null float64
20 Severe Wasting Footnote 396 non-null object
21 Wasting 1055 non-null float64
22 Wasting Footnote 549 non-null object
23 Overweight 964 non-null float64
24 Overweight Footnote 461 non-null object
25 HAZ Survey Sample (N) 977 non-null float64
26 Stunting 1051 non-null float64
27 Stunting Footnote 466 non-null object
28 WAZ Survey Sample (N) 891 non-null float64
29 Underweight 1067 non-null float64
30 Underweight Footnote 517 non-null object
31 Fieldwork period 957 non-null object
32 Report Author 1098 non-null object
33 Source 1100 non-null object
34 Short Source 1100 non-null object
35 U5 Population ('000s) 1100 non-null float64
dtypes: float64(9), int64(3), object(24)
memory usage: 309.5+ KB
# Find null/NA columns
malnutrition_df.isnull().sum()
ISO code 0
Country and areas 0
Survey year 0
Year* 0
United Nations Region 0
United Nations Sub-Region 0
SDG Region 0
UNICEF Region 12
UNICEF Sub-Region 12
WHO Region 9
World Bank Income Classification 21
World Bank Region 0
LDC 696
LIFD 664
LLDC or SIDS 711
UNICEF Survey ID 0
WHO Global Database Number 0
Type of Estimate 0
WHZ Survey Sample (N) 118
Severe Wasting 203
Severe Wasting Footnote 704
Wasting 45
Wasting Footnote 551
Overweight 136
Overweight Footnote 639
HAZ Survey Sample (N) 123
Stunting 49
Stunting Footnote 634
WAZ Survey Sample (N) 209
Underweight 33
Underweight Footnote 583
Fieldwork period 143
Report Author 2
Source 0
Short Source 0
U5 Population ('000s) 0
dtype: int64
The dataset contains numerous null (NaN) values. The approach below replaces the null values in numeric columns with country-level means: the dataframe is grouped by country, the mean of each numeric column is calculated within each group, and the null values are filled with that mean.
For countries with no values in any survey year, the remaining nulls are filled with zeros. This strategy was chosen over dropna() to avoid removing important data, since dropna() reduced the sample size substantially. Note that these zeros are placeholders rather than measured rates, so they pull down the averages computed in later cells.
# List of numeric columns
numeric_columns = malnutrition_df.select_dtypes(include='number').columns
# Group by 'Country and areas' and fill NaN values with the mean of each group for numeric columns
malnutrition_df[numeric_columns] = malnutrition_df.groupby(['Country and areas'])[numeric_columns].transform(lambda group: group.fillna(group.mean()))
# Fill remaining NaN values with 0 for numeric columns only
malnutrition_df[numeric_columns] = malnutrition_df[numeric_columns].fillna(0)
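The two-step fill described above can be illustrated on a small hypothetical frame. The helper `impute_group_mean` is an illustrative name, not part of pandas or of the original notebook. It uses `groupby(...).transform("mean")`, which computes each group's mean once and aligns it with the original rows; this is a vectorized alternative to the lambda in the cell above, under the assumption that pandas and NumPy are installed.

```python
import numpy as np
import pandas as pd


def impute_group_mean(df, group_col, cols, fallback=0.0):
    """Fill NaNs in `cols` with the mean of the row's group, then fill any leftovers with `fallback`."""
    out = df.copy()
    # transform("mean") returns group means aligned row-by-row with `df`
    group_means = out.groupby(group_col)[cols].transform("mean")
    out[cols] = out[cols].fillna(group_means).fillna(fallback)
    return out


# Hypothetical toy data: country "A" has one missing survey value, "B" is complete,
# and "C" has no stunting value in any survey year.
toy = pd.DataFrame({
    "Country and areas": ["A", "A", "B", "B", "C"],
    "Stunting": [40.0, np.nan, 20.0, 30.0, np.nan],
})

imputed = impute_group_mean(toy, "Country and areas", ["Stunting"])
print(imputed["Stunting"].tolist())  # [40.0, 40.0, 20.0, 30.0, 0.0]
# The zero placeholder for "C" pulls the overall mean down: 26.0 instead of 32.5
print(imputed["Stunting"].mean())  # 26.0
```

The toy output shows the trade-off of the zero fallback: country "C" keeps its row, but its placeholder is averaged in as if it were a real 0% rate.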
# Explore basic statistics
malnutrition_df.describe()
|  | Year* | UNICEF Survey ID | WHO Global Database Number | WHZ Survey Sample (N) | Severe Wasting | Wasting | Overweight | HAZ Survey Sample (N) | Stunting | WAZ Survey Sample (N) | Underweight | U5 Population ('000s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1100.000000 | 1100.000000 | 1100.000000 | 1.100000e+03 | 1100.000000 | 1100.000000 | 1100.000000 | 1.100000e+03 | 1100.000000 | 1.100000e+03 | 1100.000000 | 1100.000000 |
| mean | 2005.846364 | 1366.496364 | 3100.658182 | 2.691202e+09 | 1.859558 | 6.615872 | 6.018644 | 2.709543e+09 | 27.187440 | 5.530074e+09 | 14.956497 | 6052.535110 |
| std | 9.576807 | 1804.161697 | 1689.987859 | 7.336579e+10 | 1.657911 | 4.886439 | 4.380790 | 7.386786e+10 | 15.877532 | 9.685860e+10 | 12.281906 | 16093.963919 |
| min | 1983.000000 | 1.000000 | 23.000000 | 0.000000e+00 | 0.000000 | 0.000000 | 0.000000 | 0.000000e+00 | 0.000000 | 0.000000e+00 | 0.000000 | 1.141000 |
| 25% | 1998.000000 | 302.750000 | 2326.250000 | 2.914700e+03 | 0.700000 | 2.600000 | 2.600000 | 2.916000e+03 | 14.175000 | 3.349700e+03 | 4.300000 | 609.980255 |
| 50% | 2007.000000 | 627.500000 | 3119.500000 | 5.929150e+03 | 1.400000 | 5.500000 | 5.200000 | 5.868750e+03 | 27.150000 | 6.768950e+03 | 12.450000 | 2023.768494 |
| 75% | 2014.000000 | 2038.250000 | 3426.250000 | 1.886365e+04 | 2.575000 | 9.600000 | 8.300000 | 1.811731e+04 | 38.390625 | 2.383295e+04 | 21.900000 | 4265.401245 |
| max | 2022.000000 | 8994.000000 | 9862.000000 | 2.361127e+12 | 12.900000 | 25.300000 | 29.600000 | 2.377285e+12 | 73.600000 | 2.431224e+12 | 66.800000 | 131429.437500 |
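Stunting: Stunting is a form of malnutrition characterized by impaired growth and development, particularly in the early years of a child's life. It is measured as low height-for-age (more than two standard deviations below the WHO Child Growth Standards median).
Wasting: Wasting refers to the rapid and severe loss of weight and muscle mass, often associated with acute malnutrition. It is measured as low weight-for-height (more than two standard deviations below the median).
Severe Wasting: Severe wasting is an even more critical condition than wasting. It indicates a severe degree of malnutrition: children with severe wasting have a very low weight for their height (more than three standard deviations below the median), and they are also counted as wasted.
Underweight: Underweight is a general term indicating that a child's weight is below the expected standard for their age (weight-for-age more than two standard deviations below the median).
Overweight: Overweight signifies an excess of body weight relative to height (weight-for-height more than two standard deviations above the median), often resulting from an imbalance between caloric intake and energy expenditure. Preventing it typically involves a balanced diet, regular physical activity, and healthy lifestyle habits.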
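The indicators in this dataset are survey-level prevalences derived from child-level z-scores against the WHO Child Growth Standards, using the standard cut-offs: wasting below -2 and severe wasting below -3 for weight-for-height, overweight above +2 for weight-for-height, stunting below -2 for height-for-age, and underweight below -2 for weight-for-age. The sketch below applies those cut-offs to hypothetical z-scores. `classify_child` and `prevalence` are illustrative names; the dataset itself contains only the resulting percentages, not individual measurements.

```python
def classify_child(whz, haz, waz):
    """Return the indicators a single child counts towards, using WHO z-score cut-offs.

    whz: weight-for-height z-score, haz: height-for-age z-score, waz: weight-for-age z-score.
    """
    labels = []
    if whz < -3:
        labels.append("Severe Wasting")
    if whz < -2:  # severely wasted children are also counted as wasted
        labels.append("Wasting")
    if whz > 2:
        labels.append("Overweight")
    if haz < -2:
        labels.append("Stunting")
    if waz < -2:
        labels.append("Underweight")
    return labels


def prevalence(z_scores, cutoff=-2.0):
    """Percentage of children whose z-score falls below `cutoff`."""
    below = sum(1 for z in z_scores if z < cutoff)
    return 100.0 * below / len(z_scores)


print(classify_child(whz=-3.2, haz=-2.5, waz=-1.0))  # ['Severe Wasting', 'Wasting', 'Stunting']
print(prevalence([-2.5, -1.0, 0.3, -2.1]))  # 50.0
```

Because severe wasting is nested inside wasting, the two columns are correlated by construction, which is worth keeping in mind when reading the correlation heatmap.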
Factors contributing to elevated malnutrition rates include economic challenges, food insecurity, healthcare disparities, and the prevalence of infections.
Understanding and addressing these underlying factors is crucial for developing targeted interventions aimed at reducing malnutrition rates in these regions. Comprehensive strategies encompassing economic development, food security measures, improved healthcare access, and infection prevention are essential to mitigate the impact of malnutrition on these communities.
# Average stunting rate per country, keeping the 10 highest
avg_stunting = malnutrition_df.groupby("Country and areas")["Stunting"].mean().nlargest(10).round(2)
fig = px.bar(
    x=avg_stunting.index,
    y=avg_stunting.values,
    color=avg_stunting.index,
    color_discrete_sequence=px.colors.sequential.Cividis,
)
fig.update_layout(
    title="Top 10 Countries with the highest average stunting rates",
    xaxis_title="Country name",
    yaxis_title="Average stunting rate",
    xaxis=dict(tickangle=-45),
)
fig.show()
# Average overweight rate per country, keeping the 10 highest
avg_overweight = malnutrition_df.groupby('Country and areas')['Overweight'].mean().nlargest(10).round(2)
fig_overweight = px.bar(
    x=avg_overweight.index,
    y=avg_overweight.values,
    color=avg_overweight.index,
    color_discrete_sequence=px.colors.sequential.Cividis,
)
fig_overweight.update_layout(
    title="Top 10 Countries with the highest average overweight rates",
    xaxis_title="Country name",
    yaxis_title="Average Overweight Rate",
    xaxis=dict(tickangle=-45),
)
fig_overweight.show()
# Average wasting rate per country, keeping the 10 highest
avg_wasting = malnutrition_df.groupby('Country and areas')['Wasting'].mean().nlargest(10).round(2)
fig_wasting = px.bar(
    x=avg_wasting.index,
    y=avg_wasting.values,
    color=avg_wasting.index,
    color_discrete_sequence=px.colors.sequential.Cividis,
)
fig_wasting.update_layout(
    title="Top 10 Countries with the highest average wasting rates",
    xaxis_title="Country name",
    yaxis_title="Average Wasting Rate",
    xaxis=dict(tickangle=-45),
)
fig_wasting.show()
# Average underweight rate per country, keeping the 10 highest
avg_underweight = malnutrition_df.groupby('Country and areas')['Underweight'].mean().nlargest(10).round(2)
fig_underweight = px.bar(
    x=avg_underweight.index,
    y=avg_underweight.values,
    color=avg_underweight.index,
    color_discrete_sequence=px.colors.sequential.Cividis,
)
fig_underweight.update_layout(
    title="Top 10 Countries with the highest average underweight rates",
    xaxis_title="Country name",
    yaxis_title="Average Underweight Rate",
    xaxis=dict(tickangle=-45),
)
fig_underweight.show()
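The four chart cells above repeat the same group-average-and-rank step for different indicators. A minimal sketch of factoring that step into one function, assuming only pandas; `top_n_by_mean` is a hypothetical helper name, not part of the original notebook, and the toy frame stands in for `malnutrition_df`.

```python
import pandas as pd


def top_n_by_mean(df, group_col, value_col, n=10, decimals=2):
    """Average `value_col` within each group and return the `n` largest averages, rounded."""
    return df.groupby(group_col)[value_col].mean().nlargest(n).round(decimals)


# Hypothetical toy data standing in for malnutrition_df
toy = pd.DataFrame({
    "Country and areas": ["A", "A", "B", "C"],
    "Stunting": [10.0, 20.0, 30.0, 5.0],
})
print(top_n_by_mean(toy, "Country and areas", "Stunting", n=2).to_dict())  # {'B': 30.0, 'A': 15.0}
```

With this helper, the four charts reduce to a loop over `['Stunting', 'Overweight', 'Wasting', 'Underweight']`, passing each result `s` to `px.bar(x=s.index, y=s.values, color=s.index)` and building the title from the indicator name.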
Malnutrition is a serious problem that can have a number of negative consequences, including stunted growth, weakened immune systems, and increased vulnerability to disease. It is important to address the root causes of malnutrition in order to improve the lives of people in these countries.
# Calculate the average of the four indicator columns for each survey
malnutrition_df['Overall malnutrition rate'] = malnutrition_df[['Overweight', 'Underweight', 'Wasting', 'Stunting']].mean(axis=1).round(2)
# Find the 5 countries with the highest average
avg_overall = malnutrition_df.groupby('Country and areas')['Overall malnutrition rate'].mean().nlargest(5).round(2)
fig_average = px.bar(
    y=avg_overall.index,
    x=avg_overall.values,
    color=avg_overall.index,
    orientation='h',
    color_discrete_sequence=px.colors.sequential.Cividis,
)
fig_average.update_layout(
    title="Top 5 Countries with the highest average malnutrition rates",
    yaxis_title="Country name",
    xaxis_title="Average Malnutrition Rate",
)
fig_average.show()
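# Average the overall rate per country so that each country is drawn once
country_avg = (
    malnutrition_df.groupby(['ISO code', 'Country and areas'], as_index=False)['Overall malnutrition rate']
    .mean()
    .round(2)
)
# Create a choropleth map based on the average values
fig = px.choropleth(
    country_avg,
    locations='ISO code',
    color='Overall malnutrition rate',
    hover_name='Country and areas',
    title='Average Malnutrition Indicator Across Countries',
    color_continuous_scale=px.colors.sequential.Cividis,
    labels={'Overall malnutrition rate': 'Average Malnutrition Indicator'},
)
fig.show()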
The map suggests a strong association between income and malnutrition: the lower a country's income group, the higher its average malnutrition rate tends to be. This is a complex issue with no easy solutions, but it is important to be aware of the problem and to work towards addressing it.
# Calculate the average malnutrition rate for each income classification
avg_malnutrition_by_income = malnutrition_df.groupby('World Bank Income Classification')['Overall malnutrition rate'].mean().round(2).reset_index()
fig_pie_avg_by_income = px.pie(
avg_malnutrition_by_income,
names='World Bank Income Classification',
values='Overall malnutrition rate',
title='Average Malnutrition Rate by World Bank Income Classification',
color_discrete_sequence=px.colors.sequential.Cividis
)
fig_pie_avg_by_income.show()
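A pie chart shows each income group's average as a share of the sum of all group averages, which is not itself a meaningful quantity, so it cannot confirm a trend across income levels. One way to quantify the income relationship is a Spearman rank correlation between an ordinal income code and the malnutrition rate. A minimal sketch with pandas; the label spellings and ordering in `INCOME_ORDER` are assumptions to be checked against the actual column values, and `income_rank_correlation` is a hypothetical helper.

```python
import pandas as pd

# ASSUMPTION: ordering and spelling of the income groups; check the actual labels with
# malnutrition_df['World Bank Income Classification'].unique() before using this.
INCOME_ORDER = ["Low Income", "Lower Middle Income", "Upper Middle Income", "High Income"]


def income_rank_correlation(df, income_col, rate_col, order=INCOME_ORDER):
    """Spearman correlation between an ordinal income code and a rate column."""
    # Encode income groups as 0, 1, 2, ... following `order`; unrecognized labels become NaN
    codes = df[income_col].map({label: i for i, label in enumerate(order)})
    valid = codes.notna() & df[rate_col].notna()
    # Spearman's rho equals the Pearson correlation of the ranks (ties get average ranks)
    return codes[valid].rank().corr(df.loc[valid, rate_col].rank())


toy = pd.DataFrame({
    "World Bank Income Classification": ["Low Income", "Lower Middle Income",
                                         "Upper Middle Income", "High Income"],
    "Overall malnutrition rate": [30.0, 20.0, 10.0, 5.0],
})
rho = income_rank_correlation(toy, "World Bank Income Classification", "Overall malnutrition rate")
print(round(rho, 6))  # -1.0
```

A value near -1 would support the claim that malnutrition falls as income rises; computing ranks manually avoids the SciPy dependency that `Series.corr(method='spearman')` needs.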
The following line chart shows how the average rates of five malnutrition indicators (underweight, overweight, severe wasting, wasting, and stunting) changed over time, based on surveys from 1983 to 2022. Because a different set of countries is surveyed each year, the yearly averages reflect changes in survey coverage as well as genuine trends.
malnutrition_cols = ['Severe Wasting', 'Wasting', 'Overweight', 'Stunting', 'Underweight']
# Average each indicator over all surveys conducted in the same year
yearly_avg = malnutrition_df.groupby('Year*')[malnutrition_cols].mean().reset_index()
fig = px.line(
    yearly_avg,
    x='Year*',
    y=malnutrition_cols,
    title='Malnutrition Trends Over Time',
    labels={'value': 'Malnutrition Rates', 'variable': 'Indicator'},
    color_discrete_sequence=px.colors.sequential.Magma,
)
fig.update_layout(
    xaxis=dict(title='Year'),
    yaxis=dict(title='Malnutrition Rates'),
    xaxis_rangeslider_visible=True,
).show()
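Overall, the correlation matrix suggests strong relationships between several forms of malnutrition, which implies that interventions need to be comprehensive and address the underlying causes shared by multiple forms. Two caveats apply: severe wasting is a subset of wasting, so those two indicators are correlated by construction, and the zeros used to fill missing values can distort the coefficients.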
heatmap_data = malnutrition_df[['Severe Wasting', 'Wasting', 'Overweight', 'Stunting', 'Underweight']]
fig = px.imshow(heatmap_data.corr(), x=heatmap_data.columns, y=heatmap_data.columns,
labels=dict(color='Correlation'), color_continuous_scale='Blues')
fig.update_layout(title='Correlation Heatmap of Malnutritional Indicators')
fig.show()
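Because rows filled with zeros during imputation sit at the origin of every indicator pair, they can inflate or even flip a correlation. A minimal sketch of recomputing the matrix while treating 0 as missing, assuming pandas and NumPy; `correlation_excluding_zeros` is a hypothetical helper, and it also drops any genuine 0.0 values (for example, a reported severe wasting rate of 0), so it is a sensitivity check rather than a replacement.

```python
import numpy as np
import pandas as pd


def correlation_excluding_zeros(df, cols):
    """Pearson correlation matrix that treats 0 as missing (pairwise deletion)."""
    # DataFrame.corr skips NaN pairwise, so each coefficient uses only rows where both values are non-zero
    return df[cols].replace(0, np.nan).corr()


# Hypothetical toy data: the last row mimics a zero-filled placeholder
toy = pd.DataFrame({"a": [1.0, 2.0, 3.0, 0.0], "b": [3.0, 2.0, 1.0, 0.0]})
print(round(toy.corr().loc["a", "b"], 6))  # 0.2
print(round(correlation_excluding_zeros(toy, ["a", "b"]).loc["a", "b"], 6))  # -1.0
```

In the notebook this would be called as `correlation_excluding_zeros(malnutrition_df, malnutrition_cols)` and the result passed to `px.imshow` in place of `heatmap_data.corr()`.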
The following scatter matrix plots each pair of indicators (severe wasting, wasting, overweight, and stunting) against one another, colored by World Bank Region, which shows how the relationships between them differ across regions.
fig = px.scatter_matrix(
malnutrition_df,
dimensions=['Severe Wasting', 'Wasting', 'Overweight', 'Stunting'],
color='World Bank Region',
title='Scatter Matrix of Malnutrition Indicators',
color_discrete_sequence=px.colors.sequential.Plasma,
height=1000,
width=1000
)
fig.show()
The graph suggests that South Asia (SA) has the highest rates of stunting, severe wasting, underweight, and wasting, with Sub-Saharan Africa following closely. For overweight, Europe and Central Asia (ECA) takes the lead, followed by the Middle East and North Africa (MENA).
# Average each indicator per UNICEF region and reshape to long format for a grouped bar chart
region_avg = (
    malnutrition_df.groupby('UNICEF Region')[malnutrition_cols]
    .mean()
    .round(2)
    .reset_index()
    .melt(id_vars='UNICEF Region', var_name='Malnutrition Type', value_name='Malnutrition Rate')
)
fig = px.bar(
    region_avg,
    x='UNICEF Region',
    y='Malnutrition Rate',
    color='Malnutrition Type',
    barmode='group',
    title='Malnutrition Rates by UNICEF Region',
    color_discrete_sequence=px.colors.sequential.Cividis,
)
# Show the plot
fig.show()
# Convert 'U5 Population (\'000s)' to millions
malnutrition_df['U5 Population (Millions)'] = malnutrition_df['U5 Population (\'000s)'] / 1000
# Keep each country's most recent survey so that countries surveyed several times are not counted repeatedly
latest_surveys = malnutrition_df.sort_values('Year*').groupby('Country and areas').tail(1)
# Sum the U5 Population (in millions) per SDG Region
total_sdg_population = latest_surveys.groupby('SDG Region')['U5 Population (Millions)'].sum()
# Keep all surveys from the 5 SDG Regions with the highest U5 Population
top_regions_df = malnutrition_df[malnutrition_df['SDG Region'].isin(total_sdg_population.nlargest(5).index)]
# Calculate the average malnutrition rates for each region
avg_malnutrition = top_regions_df.groupby('SDG Region')[malnutrition_cols + ['Overall malnutrition rate']].mean().reset_index()
# Create the table using Plotly, then show it
table = go.Figure(data=[go.Table(
    header=dict(values=['SDG Region', 'U5 Population (Millions)'] + malnutrition_cols + ['Overall malnutrition rate']),
    cells=dict(values=[avg_malnutrition['SDG Region']]
               + [total_sdg_population.loc[avg_malnutrition['SDG Region']].round(2).tolist()]
               + [avg_malnutrition[col].round(2).tolist()
                  for col in malnutrition_cols + ['Overall malnutrition rate']]),
)])
table.show()
Southern Asia exhibits the highest prevalence of both wasting and underweight. The two indicators are positively correlated: countries with high wasting rates tend to have high underweight rates as well, although there are exceptions to this pattern.
fig = px.scatter(
malnutrition_df,
x='Wasting',
y='Underweight',
color='United Nations Sub-Region',
size='U5 Population (\'000s)',
title='Correlation between Wasting and Underweight',
labels={'Wasting': 'Wasting', 'Underweight': 'Underweight'},
hover_data=['Country and areas', 'Survey year']
)
fig.show()
Southern Asia exhibits the highest prevalence of both underweight and stunting. The two indicators are positively correlated: countries with high underweight rates tend to have high stunting rates as well, although there are exceptions to this pattern.
fig = px.scatter(
malnutrition_df,
x='Underweight',
y='Stunting',
color='United Nations Sub-Region',
size='U5 Population (\'000s)',
title='Correlation between Underweight and Stunting',
labels={'Underweight': 'Underweight', 'Stunting': 'Stunting'},
hover_data=['Country and areas', 'Survey year']
)
fig.show()
Southern Asia exhibits the highest prevalence of both wasting and stunting. The two indicators are positively correlated: countries with high wasting rates tend to have high stunting rates as well, although there are exceptions to this pattern.
fig = px.scatter(
malnutrition_df,
x='Wasting',
y='Stunting',
color='United Nations Sub-Region',
size='U5 Population (\'000s)',
title='Correlation between Wasting and Stunting',
labels={'Wasting': 'Wasting', 'Stunting': 'Stunting'},
hover_data=['Country and areas', 'Survey year']
)
fig.show()
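The scatter plots above size each bubble by U5 population, but a plain correlation coefficient (as in the heatmap) gives every survey the same weight. If the aim is to describe how two indicators co-vary for the typical child rather than the typical survey, one option is a population-weighted Pearson correlation. A minimal sketch with NumPy's `np.cov(..., aweights=...)`; `weighted_pearson` is a hypothetical helper and the input values are made up for illustration.

```python
import numpy as np


def weighted_pearson(x, y, weights):
    """Pearson correlation where each observation counts in proportion to `weights`."""
    cov = np.cov(x, y, aweights=weights)
    # The normalization factor is shared by all entries of the matrix, so it cancels here
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


# Hypothetical values: wasting and underweight rates for four surveys, weighted by U5 population
wasting = np.array([5.0, 10.0, 15.0, 20.0])
underweight = np.array([12.0, 18.0, 30.0, 33.0])
u5_population = np.array([200.0, 5000.0, 800.0, 1500.0])
print(weighted_pearson(wasting, underweight, u5_population))
print(np.corrcoef(wasting, underweight)[0, 1])  # unweighted, for comparison
```

On the notebook's data this could be called as `weighted_pearson(malnutrition_df['Wasting'], malnutrition_df['Underweight'], malnutrition_df["U5 Population ('000s)"])`; the zero placeholders from the imputation step would still affect the result.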